Relevance vector machine, a sparse probabilistic learning machine based on the kernel function, has
excellent prediction and generalization ability. This paper proposes the optimized relevance vector
machine (ORVM), a wind power interval forecasting model that provides a point prediction value and
its possible fluctuation range at a given confidence level. The proposed model is characterized by
training on insufficient samples and by uncertainty analysis, and is well suited to most wind farms
in China (newly built or large-scale wind farms). First, a grouping mechanism is used to divide wind
turbines into several groups so that a forecasting model is established for each group separately.
Second, a selection method that takes the characteristics of the NWP error distribution into
consideration is presented to improve the forecasting accuracy of each group. Third, the parameters
of the kernel function and the initial value of the iteration are determined by particle swarm
optimization to further enhance forecasting accuracy. Data were collected from two wind farms in
China. The performance of the ORVM models is tested against predictions generated by GA-ANN and SVM.
Results show that the proposed model has better prediction accuracy, wider application scope and
more efficient calculation.

©  2013  Elsevier  Ltd.  All  rights  reserved. 
1. Introduction

With the deterioration of the environment, the exhaustion of fossil fuels and the
increasing demand for electricity, wind power has been attracting
significant attention in many countries worldwide. In fact, wind
power is the fastest growing source of renewable energy [1].
However, the variable nature of wind energy may put the
reliability, stability and power quality of the electric power
system at risk [2].

One of the most essential measures to mitigate the serious
influence of wind farm integration is short-term wind
power prediction. Accurate and reliable wind power prediction
allows: (1) reasonable maintenance schedules to be established
and minimum spinning reserve capacities to be determined so as
to reduce operating costs; (2) the proportion of wind power in the
electric system to be increased; (3) the competitiveness of wind power
companies to be improved in competitive bidding markets [3,4].

Nomenclature

NWP      Numerical weather prediction
GA       Genetic algorithm
ANN      Artificial neural network
PSO      Particle swarm optimization
SVM      Support vector machine
GA-ANN   ANN optimized by genetic algorithm
RVM      Relevance vector machine
SOFM     Self-organizing feature mapping
ORVM     Optimized relevance vector machine
WT       Wind turbine

Commonly used wind power prediction methods include physical methods [43,44] (the analytical
method and the CFD method) and statistical methods (such as artificial neural networks
[5-8], fuzzy logic [34,40], the Cao algorithm [38], support vector
machines [9-11] and ensemble methods [39]). These prediction
models have been applied to economical maintenance
scheduling, competitive bidding markets [3,4] and unit commitment
[41,42]. However, they also suffer from some disadvantages. Among the
physical methods, the analytical method can hardly meet the precision
requirement, and the key problem of the CFD method is its computational
burden. Among the statistical methods, ANN is the most widely used
because of its generalization ability in prediction. Since
ANN can theoretically approximate any nonlinear continuous function,
it has been successfully applied to wind power prediction.
However, the performance of an ANN-based model is sensitive to
the size of the training set [37], and minimizing only the training
error of a neural network may lead to over-fitting [12]. The
consequence is that the prediction error is very small for known inputs,
while it surges for unknown inputs outside the samples.
This is termed limited generalization [13]. Increasing the number of
training samples is one way of
improving the generalization performance of ANN. However, this
demands a large amount of training samples, which in turn limits
the application of the ANN model. For instance, newly built wind farms
have insufficient historical data to build a prediction
model because of their short running time. Furthermore, the shortage
of historical data undoubtedly makes it more difficult to train separate
prediction models for the weather variation of different months
or seasons.

To enhance generalization ability without requiring a large
training set, a statistical learning technique,
SVM, has been applied to wind power prediction. The SVM
employs a linear function in a high-dimensional feature space as its
hypothesis space and makes good predictions using small training
sets. This is a highly effective mechanism for avoiding over-fitting.
However, despite its success, we can identify some significant
and practical disadvantages of SVM [14]:

(1)  The  kernel  function  must  satisfy  Mercer's  condition.  That  is,  it 
must  be  the  continuous  symmetric  kernel  of  a  positive  integral 
operator; 

(2)  Only  a  single  point  estimate  can  be  achieved  without  any 
uncertainty  information; 

(3) Although relatively sparse, the number of support vectors
grows linearly with the size of the training set,
which increases the computational complexity;

(4) It is necessary to estimate additional parameters, such as the insensitivity
parameter, which generally entails extra calculation and parameter setting.

To overcome the above drawbacks, a probabilistic learning framework
termed relevance vector machine (RVM) was originally
introduced by Tipping [14]. RVM is a nonlinear pattern recognition
model with a simple structure, based on Bayesian theory and
marginal likelihood. The key feature of RVM is that, as well as
offering excellent prediction and generalization performance, it
remedies the inadequacies of SVM [15-17]. This approach
has therefore been successfully applied in many fields, such as load
forecasting and fault classification [18-21], but has not yet been
applied to wind power prediction.

In addition to advanced mathematics, different strategies have
been developed to improve forecast accuracy. Many researchers
have proved the existence of a smoothing effect, which means that the
overall wind power fluctuations decrease because
different wind resources in a large area offset one another [24,25]. This effect
grows with the wind farm area and can be
employed to manage the electricity quality and the fluctuations of
wind farm output [26-30]. In fact, this effect also occurs when
forecasting the power output of a single wind farm, because the
forecast errors of individual wind turbines located at different sites
can offset each other and thereby reduce the overall forecast
error of the wind farm [31,32]. However, the computational costs
would surge if the output of every wind turbine were predicted,
especially when NWP at each wind turbine site is needed. Therefore,
a grouping method has been developed to divide wind
turbines into several groups according to wind speed
correlation, wind power correlation and wind turbine sites. A
forecasting model is established for each group, and the forecast
of wind farm output is derived by adding the results of the groups
together.

Considering smoothing effects, the features of numerical weather
prediction (NWP) error, the historical data of wind farms and the
nature of RVM mentioned above, it is not suitable to apply a raw
RVM model directly to wind power prediction. Therefore, an
optimized relevance vector machine based method (ORVM), which
combines RVM with the grouping method, the selection method for
training samples and particle swarm optimization (PSO), is used
to predict the wind farm output of each month.

In this paper, Section 2 describes the theory of RVM. Section 3
describes the ORVM wind power forecasting model. To verify the
effectiveness and superiority of the ORVM model, Section 4
presents a case study with two wind farms comparing the
performance of ORVM, SVM and GA-ANN. Section 5 gives the
final conclusions.


2.  Theory  of  relevance  vector  machine  [14] 

Given a set of input-target pairs $\{x_n, t_n\}_{n=1}^{N}$, assume that
$t_n = y(x_n; w) + \varepsilon_n$, where $\varepsilon_n$ is assumed to be mean-zero Gaussian
noise with variance $\sigma^2$, that is $N(0, \sigma^2)$. RVM uses a kernel function
$K(x, x_i)$ and makes predictions by the function:

$$y(x; w) = w^{T}\phi(x) = \sum_{i=1}^{M} w_i K(x, x_i) + w_0 \tag{1}$$

where $\phi(x)$ is the vector of basis functions and $w = (w_1, w_2, \ldots, w_M)$ is the
weight vector.
Therefore, the probabilistic formulation of the RVM model is
defined as

$$p(t_n \mid x) = N(t_n \mid y(x_n), \sigma^2) \tag{2}$$

where $N$ represents a Gaussian distribution over $t_n$ with mean
$y(x_n)$ and variance $\sigma^2$. The definition of $y(x_n)$ is the same as in
function (1).

The likelihood function of the whole sample set is defined as follows:

$$p(t \mid w, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2\sigma^2}\lVert t - \Phi w\rVert^2\right\} \tag{3}$$

To overcome the over-fitting that results from maximum-likelihood
estimation of $w$ and $\sigma^2$, a constraint on the weights $w_i$
is imposed in the form of a 'prior' probability distribution:

$$p(w \mid \alpha) = \prod_{i=0}^{N} N(w_i \mid 0, \alpha_i^{-1}) \tag{4}$$

where $\alpha$ is a vector of $N+1$ 'hyperparameters'.

The posterior over the unknowns is obtained by Bayesian inference:

$$p(w, \alpha, \sigma^2 \mid t) = \frac{p(t \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2)}{p(t)} \tag{5}$$

Given a new test input $x_*$, the predictive distribution of the corresponding
test target $t_*$ can be written as:

$$p(t_* \mid t) = \int p(t_* \mid w, \alpha, \sigma^2)\, p(w, \alpha, \sigma^2 \mid t)\, dw\, d\alpha\, d\sigma^2 \tag{6}$$

The posterior distribution over the weights can consequently be
written as:

$$p(w \mid t, \alpha, \sigma^2) = \frac{p(t \mid w, \sigma^2)\, p(w \mid \alpha)}{p(t \mid \alpha, \sigma^2)} \tag{7}$$


Therefore, the learning process of RVM becomes the search for the
$\alpha$ and $\sigma^2$ that maximize $p(\alpha, \sigma^2 \mid t) \propto p(t \mid \alpha, \sigma^2)\, p(\alpha)\, p(\sigma^2)$,
using the maximum marginal likelihood method:

$$p(t \mid \alpha, \sigma^2) = \int p(t \mid w, \sigma^2)\, p(w \mid \alpha)\, dw = (2\pi)^{-N/2} \left|\sigma^2 I + \Phi A^{-1} \Phi^{T}\right|^{-1/2} \exp\left\{-\frac{1}{2} t^{T} \left(\sigma^2 I + \Phi A^{-1} \Phi^{T}\right)^{-1} t\right\} \tag{8}$$

where $A = \mathrm{diag}(\alpha_0, \alpha_1, \ldots, \alpha_N)$.

The values of $\alpha$ and $\sigma^2$ are computed by setting the derivatives of
function (8) with respect to $\alpha$ and $\sigma^2$ to zero. When $\alpha_i$ becomes extremely
large, $w_i$ goes to zero because of the constraint imposed by the prior. A weight $w_i$
associated with a small $\alpha_i$, by contrast, is free to fit the sample data. The iteration
proceeds until the convergence condition is fulfilled. During
parameter estimation, most $\alpha_i \to \infty$, and the corresponding
$w_i = 0$. Consequently, many terms of the kernel matrix do not participate in the
prediction calculation. This is why RVM
achieves sparsity.
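The re-estimation loop described above can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names `gaussian_kernel` and `rvm_fit`, the pruning threshold `prune_at`, the initial noise variance and the fixed iteration count are all hypothetical choices, and the update formulas are the standard ones from Tipping's formulation.

```python
import numpy as np

def gaussian_kernel(X1, X2, width):
    # K(x, x_i) = exp(-||x - x_i||^2 / (2 * width^2)); width is an assumed parameter name.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * width ** 2))

def rvm_fit(X, t, width=1.0, alpha_init=1.0, n_iter=300, prune_at=1e6):
    """Sparse Bayesian regression by iterative hyperparameter re-estimation (sketch)."""
    N = X.shape[0]
    # Design matrix: a bias column plus one kernel column per training input.
    Phi = np.hstack([np.ones((N, 1)), gaussian_kernel(X, X, width)])
    alpha = np.full(N + 1, float(alpha_init))
    sigma2 = 0.1 * np.var(t) + 1e-6          # assumed initial noise variance
    keep = np.arange(N + 1)                   # indices of basis functions still active

    def posterior(P, a, s2):
        Sigma = np.linalg.inv(P.T @ P / s2 + np.diag(a))   # posterior covariance
        mu = Sigma @ P.T @ t / s2                           # posterior mean
        return mu, Sigma

    for _ in range(n_iter):
        P = Phi[:, keep]
        mu, Sigma = posterior(P, alpha[keep], sigma2)
        gamma = np.clip(1.0 - alpha[keep] * np.diag(Sigma), 1e-12, 1.0)
        alpha[keep] = gamma / (mu ** 2 + 1e-12)             # hyperparameter update
        sigma2 = max(np.sum((t - P @ mu) ** 2) / max(N - gamma.sum(), 1e-6), 1e-8)
        mask = alpha[keep] < prune_at                       # alpha_i -> infinity means w_i -> 0
        if mask.any():
            keep = keep[mask]

    mu, Sigma = posterior(Phi[:, keep], alpha[keep], sigma2)
    return keep, mu, Sigma, sigma2
```

The indices returned in `keep` correspond to the basis functions that survive pruning; index 0 is the bias term, and the remaining indices point at the training inputs that act as relevance vectors.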

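After the iterative estimation of the hyperparameters has converged,
predictions are made from the posterior distribution over the weights,
conditioned on the maximizing values $\alpha_{MP}$ and $\sigma^2_{MP}$. For a new input $x_*$,
the predictive distribution is:

$$p(t_* \mid t, \alpha_{MP}, \sigma^2_{MP}) = \int p(t_* \mid w, \sigma^2_{MP})\, p(w \mid t, \alpha_{MP}, \sigma^2_{MP})\, dw = N(t_* \mid y_*, \sigma_*^2) \tag{9}$$

where

$$y_* = \mu^{T}\phi(x_*) \tag{10}$$

$$\sigma_*^2 = \sigma^2_{MP} + \phi(x_*)^{T}\, \Sigma\, \phi(x_*) \tag{11}$$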
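As an illustration of how functions (10) and (11) yield a prediction interval at a given confidence level, the following Python sketch computes the predictive mean, the predictive variance and a symmetric Gaussian interval. The function name `rvm_predict_interval` and its arguments are assumptions made for this example; the interval construction simply uses the normal quantile implied by the Gaussian predictive distribution in function (9).

```python
import numpy as np
from statistics import NormalDist

def rvm_predict_interval(phi_star, mu, Sigma, sigma2_mp, confidence=0.9):
    """Predictive mean, variance and symmetric interval for one test input (sketch)."""
    phi_star = np.asarray(phi_star, dtype=float)
    y_star = float(phi_star @ mu)                              # function (10)
    var_star = float(sigma2_mp + phi_star @ Sigma @ phi_star)  # function (11)
    # Two-sided normal quantile for the requested confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    half_width = z * np.sqrt(var_star)
    return y_star, var_star, (y_star - half_width, y_star + half_width)

# Toy numbers, chosen only to make the arithmetic easy to follow.
y, var, (lo, hi) = rvm_predict_interval(
    phi_star=[1.0, 0.5],
    mu=np.array([0.2, 0.4]),
    Sigma=np.diag([0.01, 0.04]),
    sigma2_mp=0.01,
    confidence=0.95,
)
```

With these toy inputs the variance is the sum of the noise term (0.01) and the weight-uncertainty term (0.02), which mirrors the two error sources in function (11).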


3.  RVM-based  wind  power  interval  forecasting  model 

3.1. Model structure

The structure of the ORVM wind power grouping forecast model is
illustrated in Fig. 1. It is composed of a grouping engine using the SOFM
method, a selection engine for training samples, a PSO engine
for optimizing the parameters of the kernel function, and an RVM engine for
forecasting and computing the confidence interval.


3.2.  Grouping  of  wind  turbines 

Wind power production is directly correlated to the wind speed
through a power curve. Other atmospheric factors, such as wind
direction, pressure, temperature and relative humidity, also have an
impact on the actual power output [22,23]. In this paper, single
NWP results are adopted as inputs of the ORVM prediction model, and
the reference site of this single NWP is at the met mast. Although NWP
has already been widely applied in wind power forecasting, it also has
some disadvantages, such as the limited representativeness of a single reference
site, heavy computation and low computational resolution [33-35].
Current practice is to calculate NWP at the met mast to represent the
wind profile of the whole wind farm, which affects the accuracy
of wind power forecasting. In particular, as the wind farm size increases,
the representativeness of a single met mast weakens accordingly.
The more reference sites are selected
to predict the weather, the higher the forecasting accuracy that might
be achieved, but the larger the computational burden that follows.
To balance practical application and forecasting
accuracy, a grouping model has been proposed based on a
clustering method, SOFM [36], to identify similarity of wind speed,
wind turbine generating characteristics and wind turbine site. As
shown in Fig. 1, the grouping engine divides wind turbines into
n groups, and every group has its own reference site for
NWP. The subsequent engines then operate on the NWP of each group.
Note that the wind turbine
in a group that suffers the least influence from wake effects and topography
is selected as the NWP reference site of that group, for
example a wind turbine located at a higher altitude or
relatively far from other wind turbines.

Fig. 1. Structure of the ORVM grouping forecast model.

Fig. 2. Structure of the grouping engine.

The structure of the grouping engine is shown in Fig. 2. The wind speed
correlation and wind power correlation of each wind turbine help the
engine put correlated wind turbines into the same group.
Meanwhile, the site of each wind turbine brings geographic
similarity into the grouping engine. After grouping, the representativeness
of each wind turbine within its own group is evaluated by
altitude and distance to the wind farm border. Taking a wind farm in
north China (named WF1) as an example, the results of the grouping
engine and the NWP reference point for each group are shown in
Table 1.
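A minimal sketch of such a grouping engine, written as a one-dimensional self-organizing feature map in NumPy, is given here for illustration. It is not the authors' code: the function names, the feature layout (speed correlation, power correlation, site coordinates), the Gaussian neighborhood and all parameter values are assumptions made for the example.

```python
import numpy as np

def normalize01(F):
    # Scale every feature column into [0, 1]; constant columns are left at 0.
    lo, hi = F.min(axis=0), F.max(axis=0)
    return (F - lo) / np.where(hi > lo, hi - lo, 1.0)

def sofm_group(F, n_units=4, n_epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Assign each turbine (row of F) to the output unit that wins it (sketch)."""
    rng = np.random.default_rng(seed)
    X = normalize01(np.asarray(F, dtype=float))
    W = rng.random((n_units, X.shape[1]))        # connection weights, one row per unit
    units = np.arange(n_units)
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        radius = max(radius0 * (1.0 - frac), 1e-3)
        for x in X[rng.permutation(len(X))]:
            winner = np.argmin(np.linalg.norm(W - x, axis=1))    # shortest distance
            h = np.exp(-((units - winner) ** 2) / (2.0 * radius ** 2))
            W += lr * h[:, None] * (x - W)       # pull winner and neighbors toward x
    labels = np.array([np.argmin(np.linalg.norm(W - x, axis=1)) for x in X])
    return labels, W

# Hypothetical features per turbine: [speed corr., power corr., x (km), y (km)].
features = np.array([
    [0.90, 0.90, 0.0, 0.00],
    [0.88, 0.91, 0.1, 0.05],
    [0.30, 0.20, 5.0, 4.80],
    [0.28, 0.25, 5.1, 5.00],
])
labels, weights = sofm_group(features, n_units=2, n_epochs=30)
```

Turbines that share a label would form one group; choosing the NWP reference turbine within each group (by altitude and distance to other turbines) is a separate step that this sketch does not cover.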

3.3.  Selection  of  training  samples 

Figs. 3 and 4 present the accuracy of the NWP wind speed, with
the reference site at the met mast, in two wind farms (WF1 and
WF2). Observe that the correlation coefficient and root mean square
error of the NWP wind speed forecasts are not stable and fluctuate
with the month or season. In Fig. 3, the forecast accuracy
of NWP peaks in winter (Dec.-Mar.), drops to its lowest in
spring (Apr.-June), and then rises modestly in summer and fall
(July-Nov.) without reaching the winter maximum. In
Fig. 4, the NWP accuracy shows very similar features: the accuracy
peaks in winter and summer and drops to its lowest in autumn
and spring. This is because in some months or seasons
the weather changes are relatively stable and display strong
regularity. For meteorological phenomena that fit
known regularities, NWP can be more accurate; otherwise,
complex weather variation is hard to predict precisely.


Table 1
Results of grouping engine.

Group   Reference site of NWP   WT in the group
1       WT 58#                  16,17,20,55,56,58
2       WT 18#                  15,18,69,70,76
3       WT 21#                  14,19,21,57
4       WT 24#                  23-27,32-36,44,45,54
5       WT 63#                  63
6       WT 67#                  62,64,65,67,68,74,75
7       WT 59#                  22,59,60,82
8       WT 116#                 40,116,119
9       WT 43#                  28-31,37-39,41-43,46-49,53,112,114
10      WT 12#                  12
11      WT 73#                  71,73
12      WT 87#                  61,80,85,87
13      WT 102#                 101,102
14      WT 52#                  51-52,103,107,109-111,113,115,117-122
15      WT 72#                  66,72,77-79,81,83,84,86
16      WT 99#                  88-100
17      WT 108#                 9,108
18      WT 8#                   1-8,10,11,13

Note: Reference site of NWP indicates the location on which the NWP data are
calculated.
Fig. 3. Accuracy of wind speed from NWP for each month in WF1 (series: root mean square error in m/s; correlation coefficient).

Fig. 4. Accuracy of wind speed from NWP for each month in WF2 (series: root mean square error in m/s; correlation coefficient).

As the weather forecast within wind farms is the key to wind
power prediction and the NWP accuracy shows a clear
monthly or seasonal characteristic, the
accuracy of wind power forecasting can be improved if a separate model is
built for each month. Furthermore, RVM has the advantage of requiring fewer
training samples, which facilitates model training for each
month. Thus, the candidate NWP samples of each month are transferred
to the selection engine as model inputs, and prediction models are then built
according to the different error distributions of NWP.

Since RVM is an intelligent learning method adapted to
small training sets, it is both necessary and feasible to select the
most beneficial training data for accurate prediction. On the one
hand, the selection of training samples has a significant impact on
forecast accuracy. Too few training samples, or samples
with large deviations, usually cause the forecast engine to learn the
mapping function incorrectly, while a huge number of training
samples may mislead the forecast engine. On the other
hand, the data collected in a month from a wind farm far exceed the
amount of training samples required by RVM: the data
resolution of wind farms is normally no coarser than 15 min, so
there are theoretically around 2880 data sets per month (96 per day over 30 days).
Therefore, selecting the most effective NWP data sets as
training samples is the key step in predictive modeling.

In tests on WF1 (January 2010) and WF2 (January 2011), the
accuracy of NWP is classified according to the absolute error of the NWP
wind speed. For instance, the level < 1.2 m/s represents higher NWP accuracy
than < 1.8 m/s, because only samples whose absolute NWP wind speed error
is less than 1.2 m/s, rather than 1.8 m/s, are
selected as training samples. The test starts with a low NWP accuracy level
and then gradually raises it. As shown in Fig. 5, the two lines
show a similar trend: the accuracy of the forecasting model
at first increases with the NWP accuracy. This is because,
as the NWP forecasts improve, the training samples
reproduce the actual power curve of the wind turbines more accurately and
thus achieve higher wind power forecasting accuracy. In the case
of WF1, forecasting accuracy peaks at an absolute NWP wind speed error
of < 1.5 m/s, while the peak is at < 1.6 m/s in WF2. Beyond the peak,
forecasting accuracy declines as the allowed NWP error is reduced further,
because eliminating too much data leaves too few training
samples, which degrades the generalization and fitness of the
forecasting model as well as its forecast accuracy.

The selection engine was applied to the candidate samples
of each month in the two wind farms, and the results are
recorded in Table 2. The forecast accuracy of each month shows
features similar to those of January. In addition, the accuracy peak of the
forecasting model for different months is located at different absolute
errors of NWP wind speed (that is, different NWP accuracy levels). Since
samples corresponding to different NWP accuracy levels
result in different forecasting accuracy, selecting
training samples according to the NWP accuracy level may improve
the forecast accuracy.
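The selection rule can be illustrated with a short Python sketch. The helper names `select_training_samples` and `best_threshold`, and the callback `score_fn` that stands in for training and validating a forecasting model, are hypothetical; the sketch only shows the thresholding on absolute NWP wind speed error and the search over candidate accuracy levels.

```python
import numpy as np

def select_training_samples(nwp_speed, measured_speed, threshold):
    """Indices of samples whose absolute NWP wind speed error is below threshold (m/s)."""
    abs_err = np.abs(np.asarray(nwp_speed, float) - np.asarray(measured_speed, float))
    return np.flatnonzero(abs_err < threshold)

def best_threshold(nwp_speed, measured_speed, thresholds, score_fn):
    """Try each accuracy level and keep the one with the lowest score.

    score_fn(indices) is an assumed callback that trains a model on the selected
    samples and returns its validation error (for example RMSE).
    """
    scores = {
        th: score_fn(select_training_samples(nwp_speed, measured_speed, th))
        for th in thresholds
    }
    return min(scores, key=scores.get), scores

# Toy data: absolute errors are 0.5, 1.9, 0.1 and 3.0 m/s.
nwp = [5.0, 6.0, 7.5, 9.0]
measured = [5.5, 7.9, 7.4, 6.0]
picked = select_training_samples(nwp, measured, threshold=1.5)
```

In practice `score_fn` would wrap the RVM training and a validation run, so the search reproduces the kind of curve shown in Fig. 5 for each month.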


3.4.  Optimization  of  kernel  function 


Because the kernel function parameters have a significant impact on
forecasting accuracy, particle swarm optimization (PSO) is
adopted to search for the optimal kernel width and initial value of
RVM. In this paper, the Gaussian kernel function (12) is adopted:

$$K(x, x_i) = \exp\left(-\frac{\lVert x - x_i\rVert^2}{2\sigma^2}\right) \tag{12}$$

where $\sigma$ is the width of the kernel function.

There are 20 particles for each parameter used in this model,
and their fitness function is the root mean square error. The
speed and location of the particles are updated by the following functions:

$$v_{i,d}^{k+1} = \omega v_{i,d}^{k} + c_1\,\mathrm{rand}()\,(pb_{i,d}^{k} - x_{i,d}^{k}) + c_2\,\mathrm{rand}()\,(gb_{d}^{k} - x_{i,d}^{k}) \tag{13}$$

$$x_{i,d}^{k+1} = x_{i,d}^{k} + v_{i,d}^{k+1} \tag{14}$$

where $c_1$ and $c_2$ are learning factors; $\mathrm{rand}()$ is a uniform random
number in [0, 1]; $v_{i,d}^{k}$ and $x_{i,d}^{k}$ are the speed and location of the ith
particle in the kth iteration in dimension d; $pb_{i,d}^{k}$ and $gb_{d}^{k}$ are,
respectively, the individual best location of the ith particle and the group best
location in dimension d; $\omega$ is the inertia weight factor.
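The update rules (13) and (14) can be sketched as follows. This is a generic PSO loop written for illustration, not the authors' implementation: the function name `pso_minimize`, the parameter values, the clipping of particles to the search bounds and the toy fitness function are assumptions. In the ORVM setting the fitness would be the forecast RMSE obtained with a candidate kernel width and initial value.

```python
import numpy as np

def pso_minimize(fitness, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize fitness(p) over a box given as [(lo, hi), ...] (sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))   # particle locations
    v = np.zeros_like(x)                                    # particle speeds
    pbest = x.copy()
    pbest_f = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)   # function (13)
        x = np.clip(x + v, lo, hi)                                    # function (14), kept in bounds
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, float(pbest_f.min())

# Toy fitness with its minimum at (1, -2); a stand-in for the forecast RMSE.
best, best_f = pso_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                            bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

The inertia weight and learning factors used here are common textbook values; the source does not state which values were used in the paper.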

Along with the selected training samples of one month and the
corresponding actual power data, the optimized parameters from the PSO
engine are transferred to the RVM engine, whose structure is illustrated
in Fig. 6.

Fig. 5. Forecasts accuracy corresponding to different NWP error level.


Table 2
NWP accuracy level (absolute error of NWP wind speed, m/s) for the training sample selection of each month in two wind farms.

Month       WF1      WF2
January     < 1.5    < 1.6
February    < 1.0    < 1.5
March       < 1.5    < 1.7
April       < 1.5    < 1.9
May         < 1.6    < 1.6
June        < 2.0    < 2.0
July        < 2.0    < 1.2
August      < 1.6    < 1.2
September   < 1.2    < 1.5
October     -        < 1.6
November    < 1.7    < 1.6
December    < 1.5    < 1.7


3.5. Steps of grouping forecasts

Application of the ORVM model to wind power prediction can
be summarized in the following steps:

(1) Wind turbine grouping phase:

Assume that M is the number of input samples; N denotes the
number of neural cells in the output layer; k
represents the number of neural cells in the input layer; t is the
current learning iteration; T is the total number of learning
iterations.

a. To calculate the correlation of wind speed and wind
power output of each wind turbine;

b. To input the wind turbine sites and the correlations of
wind speed and wind power into the grouping engine as
input vectors, and to normalize the input vectors
into the range [0,1] as $X = \{X_i;\ X_i = (x_{i1}, x_{i2}, \ldots, x_{ik})^{T} \in R^{k},\ i = 1, 2, \ldots, M\}$;

c. To initialize the SOFM based grouping engine:
connection weights $W = \{W_j;\ W_j = (w_{j1}, w_{j2}, \ldots, w_{ji}, \ldots, w_{jk})^{T} \in R^{k},\ j = 1, 2, \ldots, N\}$;
learning pace $\alpha(t)$; neural cell neighborhood $N_{j^*}(t)$;

d. To calculate the distance between the connection weights
of each neural cell and the inputs:

$$Ed_j = \left[\sum_{i=1}^{k} (x_i - w_{ji})^2\right]^{1/2}, \quad j = 1, 2, \ldots, N \tag{15}$$

e. To search for the winning neural cell, which has the
shortest distance:

$$Ed_{j^*} = \min_{1 \le j \le N} \{Ed_j\} \tag{16}$$

f. To adjust the network parameters (connection
weights of the winning neural cell and of the other neural
cells in its neighborhood) in order to map the input vectors into the
output layer:

$$W_j(t+1) = W_j(t) + \alpha(t)\,[X - W_j(t)], \quad j \in N_{j^*}(t) \tag{17}$$

g. To iterate over every sample and to update the learning
pace until t = T:

$$\alpha(t) = \alpha(0)\left(1 - \frac{t}{T}\right) \tag{18}$$

h. Each input vector is non-linearly mapped to
its winning neural cell in the output layer. The vectors
mapped to the same winning neural cell belong to
the same category.

(2) Training samples selection phase:

To classify the NWP accuracy levels and to input all
candidate samples of each month into the selection engine to
determine the most effective level. The samples selected by the
most effective level are taken as training samples. Then,
to normalize the selected samples into the range of 0 to 1 for the
convenience of computation.

(3) Model parameters optimization phase:

To transfer the selected training samples from phase (2) to the
PSO engine to determine the most suitable kernel width
and initial value of the RVM iteration.

(4) Training and forecasting phase:

a. To input the selected training samples and optimized
parameters into the RVM engine and to initialize the iteration
conditions;

b. To calculate the posterior distribution over the weights:

$$p(w \mid t, \alpha, \sigma^2) = (2\pi)^{-(N+1)/2}\, |\Sigma|^{-1/2} \exp\left\{-\frac{1}{2}(w - \mu)^{T} \Sigma^{-1} (w - \mu)\right\} \tag{19}$$

c. To update the mean $\mu$ and covariance $\Sigma$ of the posterior,
respectively:

$$\mu = \sigma^{-2}\, \Sigma\, \Phi^{T} t \tag{20}$$

$$\Sigma = (\sigma^{-2}\, \Phi^{T} \Phi + A)^{-1} \tag{21}$$

d. To proceed with the iteration of functions (22) and (23) until
the convergence condition is fulfilled:

$$\alpha_i^{new} = \frac{1 - \alpha_i \Sigma_{ii}}{\mu_i^2} \tag{22}$$

$$(\sigma^2)^{new} = \frac{\lVert t - \Phi\mu\rVert^2}{N - \sum_i (1 - \alpha_i \Sigma_{ii})} \tag{23}$$

where $\mu_i$ is the ith posterior mean from function
(20); $\Sigma_{ii}$ is the ith diagonal element of the posterior
covariance from function (21), computed from the
$\alpha$ and $\sigma^2$ of the current iteration; N indicates the
number of sample data.

e. To delete the terms with $w_i = 0$ during the iteration process.
The vectors corresponding to the remaining $w_i$ are termed
'relevance vectors'. Consequently, RVM largely
reduces model complexity and computing costs.
The model parameters $\alpha_{MP}$, $\sigma^2_{MP}$ are obtained
when model training is completed.

f. To import the test data and model parameters $\alpha_{MP}$, $\sigma^2_{MP}$
into the forecasting model. After calculation and
de-normalization, the prediction value and its
possible fluctuation range are obtained.

Not only does the ORVM model provide an individual prediction
value, it also calculates the variance of the prediction value by
function (11), which represents the possible prediction error. The
variance comprises two kinds of error: one comes from the
estimation error (noise), while the other comes from the uncertainty of the
weights. This probabilistic forecasting mechanism largely improves the
practical value of the forecasts for risk management [14].


4. Case study

4.1. Data

The data of the two wind farms in China include the mean wind farm
output collected from SCADA, the mean wind speed from the met
mast and numerical weather prediction data. All the data have an
interval of 15 min and cover the running periods shown in
Table 3. In WF1, the wind speed data measured by each wind turbine's
SCADA are available, while they are not available in WF2. This means that the
grouping engine could only be tested in WF1. Of the available
data, 80% are used as candidate training samples and 20% as
test samples.

To evaluate the performance of the proposed model, the ORVM models
are compared with support vector machine (SVM) and ANN
optimized by genetic algorithm (GA-ANN) in terms of forecast
accuracy, model complexity and model running time. For the sake
of fairness, all methods have the same input variables, training
samples and test samples. Note that the GA-ANN training set
combines the 12 monthly ORVM training sets as a whole, because ANN
demands a large number of training samples; consequently, only
one GA-ANN model is established.

Table 3
Wind farm description.

                     WF1                     WF2
Installed capacity   183 MW                  201 MW
Running period       2010, except October    2011.5-2012.6
WT number            122                     134
Location             Northeast China         East-central China


4.2. Evaluation

Two frequently used error criteria are adopted for the numerical
experiments of this paper. Root mean square error (RMSE) in
function (24) is computed over the whole validation period and gives a
better evaluation of the prediction error over a longer period [12].
Mean absolute error (MAE) in function (25) is another commonly
used error measure.

$$RMSE = \frac{\sqrt{\sum_{i=1}^{n} (P_{Mi} - P_{Pi})^2}}{Cap \times \sqrt{n}} \tag{24}$$

$$MAE = \frac{\sum_{i=1}^{n} |P_{Mi} - P_{Pi}|}{Cap \times n} \tag{25}$$

where $P_{Mi}$ and $P_{Pi}$ indicate the actual and forecast values of wind power
output at time i; Cap denotes the installed capacity of the wind farm; n
is the number of samples involved.


Table 4
RMSE comparison of single NWP forecasts and grouping forecasts in WF1.

                     ORVM (%)   SVM (%)   GA-ANN (%)
Single NWP           9.9        12.5      13.3
Grouping forecasts   9.1        11.4      12.1
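For reference, the two capacity-normalized error criteria of functions (24) and (25) can be computed as in the following sketch. The function names and the toy power values are assumptions for illustration only.

```python
import numpy as np

def normalized_rmse(p_actual, p_forecast, capacity):
    """RMSE divided by installed capacity, as in function (24)."""
    e = np.asarray(p_actual, float) - np.asarray(p_forecast, float)
    return float(np.sqrt(np.sum(e ** 2)) / (capacity * np.sqrt(e.size)))

def normalized_mae(p_actual, p_forecast, capacity):
    """MAE divided by installed capacity, as in function (25)."""
    e = np.asarray(p_actual, float) - np.asarray(p_forecast, float)
    return float(np.sum(np.abs(e)) / (capacity * e.size))

# Toy example in MW with a hypothetical 200 MW installed capacity.
actual = [100.0, 120.0, 80.0, 60.0]
forecast = [90.0, 130.0, 80.0, 80.0]
rmse = normalized_rmse(actual, forecast, capacity=200.0)
mae = normalized_mae(actual, forecast, capacity=200.0)
```

Because both criteria are divided by the installed capacity, values from wind farms of different size (such as WF1 and WF2) can be compared directly.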
Table 5
Comparisons of forecasts accuracy for each month in WF1 (forecasts with grouping
engine).

            ORVM model       SVM model        GA-ANN model
Month       RMSE    MAE      RMSE    MAE      RMSE    MAE
Average     0.091   0.059    0.114   0.082    0.121   0.091
January     0.109   0.077    0.112   0.076    0.127   0.094
February    0.103   0.061    0.126   0.093    0.124   0.099
March       0.077   0.049    0.121   0.096    0.091   0.061
April       0.137   0.100    0.139   0.117    0.133   0.090
May         0.103   0.071    0.175   0.145    0.149   0.115
June        0.063   0.035    0.056   0.035    0.090   0.069
July        0.071   0.044    0.071   0.041    0.112   0.084
August      0.050   0.041    0.064   0.032    0.089   0.06
September   0.086   0.053    0.098   0.056    0.143   0.103
November    0.101   0.054    0.170   0.121    0.151   0.138
December    0.098   0.072    0.123   0.09     0.120   0.092

Table 6

Comparisons of forecast accuracy for each month in WF2 (forecasts without
grouping engine).

             ORVM model        SVM model         GA-ANN model
Month        RMSE     MAE      RMSE     MAE      RMSE     MAE
Average      0.119    0.092    0.144    0.104    0.142    0.106
January      0.140    0.090    0.147    0.092    0.159    0.101
February     0.137    0.091    0.141    0.097    0.150    0.119
March        0.142    0.120    0.149    0.125    0.147    0.125
April        0.151    0.134    0.177    0.1403   0.162    0.127
May          0.169    0.146    0.174    0.157    0.185    0.151
June         0.142    0.111    0.155    0.107    0.169    0.140
July         0.083    0.05     0.141    0.084    0.128    0.079
August       0.098    0.073    0.169    0.108    0.148    0.097
September    0.101    0.081    0.128    0.091    0.114    0.087
October      0.053    0.038    0.076    0.050    0.075    0.051
November     0.101    0.070    0.152    0.111    0.148    0.107
December     0.113    0.103    0.125    0.095    0.127    0.097

4.3. Analysis and discussion

Table 4 compares the single NWP forecasts and the grouping forecasts of the
three models to verify the effectiveness of the grouping engine. Single NWP
denotes forecasting with only one set of NWP data at the met mast, while
grouping forecasts denote forecasting with several sets of NWP data at the
reference sites of each group in Table 1. The results show that the grouping
engine has a positive effect on all three forecasting methods and reduces the
yearly average RMSE by (9.9% - 9.1%)/9.9% = 8.08% for ORVM,
(12.5% - 11.4%)/12.5% = 8.8% for SVM and (13.3% - 12.1%)/13.3% = 9.02% for
GA-ANN.
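
The relative improvements quoted for Table 4 follow from a one-line formula. The sketch below reproduces the arithmetic; the helper name `relative_improvement` is hypothetical and the input values are the yearly average RMSE figures of Table 4.

```python
def relative_improvement(baseline, improved):
    """Relative reduction of an error measure, in percent of the baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# Yearly average RMSE in % from Table 4: (single NWP, grouping forecasts).
table4 = {"ORVM": (9.9, 9.1), "SVM": (12.5, 11.4), "GA-ANN": (13.3, 12.1)}

gains = {
    model: round(relative_improvement(single, grouped), 2)
    for model, (single, grouped) in table4.items()
}
print(gains)  # {'ORVM': 8.08, 'SVM': 8.8, 'GA-ANN': 9.02}
```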

Tables 5 and 6 show the full-year forecast accuracy of ORVM, SVM and GA-ANN
in WF1 and WF2. Because of the data management in WF2, wind speed data
measured by the wind turbines are unavailable. Consequently, the grouping
engine could be tested in WF1 but not in WF2. This is one of the reasons why
ORVM performs better in WF1 than in WF2. Moreover, the slightly worse NWP
quality in WF2 shown in Figs. 3 and 4 might be another reason for the larger
forecasting RMSE. In general, the RMSE and MAE of ORVM, defined in functions
(24) and (25), are considerably lower than those of SVM and GA-ANN. Based on
the Average rows of Tables 5 and 6, the average RMSE and average MAE of ORVM
are lower than those of SVM by about (11.4% - 9.1%)/11.4% = 20.18% and
(8.2% - 5.9%)/8.2% = 28.05% in WF1, and by (14.4% - 11.9%)/14.4% = 17.36% and
(10.4% - 9.2%)/10.4% = 11.54% in WF2, which demonstrates the capability of the
ORVM model in wind power prediction.
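
The comparison with SVM can be recomputed directly from the Average rows of Tables 5 and 6. The following sketch is only an arithmetic check under that assumption; the dictionary layout and the name `reduction_vs_baseline` are illustrative.

```python
def reduction_vs_baseline(model_error, baseline_error):
    """Relative error reduction of a model against a baseline, in percent."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


# "Average" rows of Table 5 (WF1) and Table 6 (WF2), stored as (RMSE, MAE).
averages = {
    "WF1": {"ORVM": (0.091, 0.059), "SVM": (0.114, 0.082)},
    "WF2": {"ORVM": (0.119, 0.092), "SVM": (0.144, 0.104)},
}

reductions = {}
for farm, rows in averages.items():
    orvm, svm = rows["ORVM"], rows["SVM"]
    reductions[farm] = {
        "RMSE": reduction_vs_baseline(orvm[0], svm[0]),
        "MAE": reduction_vs_baseline(orvm[1], svm[1]),
    }
    print(farm, {name: round(value, 2) for name, value in reductions[farm].items()})
```

Replacing the `"SVM"` entries with the GA-ANN averages gives the corresponding comparison against GA-ANN.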

Table 7

Comparisons of computing time and vector number for each model.

Wind farm   Model    Training time (s)   Test time (s)   Number of vectors involved
WF1         ORVM     192.136             52.190          92.52
            SVM      180.119             67.529          116.39
            GA-ANN   1286.342            109.327         -
WF2         ORVM     15.354              0.962           69.25
            SVM      12.008              1.003           93.06
            GA-ANN   513.452             1.795           -

To better illustrate the forecast trend and the probabilistic forecast
capability of the proposed model, forecasting results of four days from
different seasons are presented as examples for each wind farm. Figs. 7 and 8
show the predicted and actual values of wind farm output as well as the range
of possible fluctuations at the 90% confidence level. ORVM provides not only a
point prediction but also its uncertainty analysis, namely the upper and lower
limits of the power fluctuation. This can provide more scientific guidance for
risk-based decisions by both wind farm operators and electric system
dispatchers.
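
A fluctuation range of this kind can be sketched as the central interval of a Gaussian predictive distribution. The sketch below assumes that the model returns a predictive mean and standard deviation for each time step and that the limits are clipped to the physical range between zero and the installed capacity; the function name, the clipping step and the numbers are illustrative assumptions, not taken from the paper.

```python
from statistics import NormalDist


def prediction_interval(mean, std, confidence=0.90, cap=None):
    """Central interval of a Gaussian predictive distribution.

    Assumption: the forecast is Gaussian with the given mean and standard
    deviation. The lower limit is clipped at 0 and, if `cap` is given, the
    upper limit is clipped at the installed capacity.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    if std < 0.0:
        raise ValueError("std must be non-negative")
    # Two-sided quantile, e.g. about 1.645 for a 90% confidence level.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    lower = max(mean - z * std, 0.0)
    upper = mean + z * std
    if cap is not None:
        upper = min(upper, cap)
    return lower, upper


# Hypothetical forecast of 90 MW with a 12 MW standard deviation.
lower, upper = prediction_interval(mean=90.0, std=12.0, confidence=0.90, cap=183.0)
print(f"90% interval: [{lower:.2f}, {upper:.2f}] MW")
```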

Validation test of the uncertainty analysis: throughout the year, the
percentage of samples in which the actual power production lies within the
fluctuation range at the 90% confidence level is 89.928% on average in WF1 and
88.31% in WF2. Both values are close to the nominal 90%, which indicates that
the ORVM model can quantitatively assess the uncertainty of wind power
prediction.
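
The validation test amounts to counting how often the measured power falls inside the forecast range. A minimal sketch of that count is given below; the function name `interval_coverage` and the sample values are hypothetical.

```python
def interval_coverage(actual, lower, upper):
    """Share of actual values (in percent) that fall inside [lower, upper]."""
    if not actual or not (len(actual) == len(lower) == len(upper)):
        raise ValueError("inputs must be non-empty and equally long")
    hits = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return 100.0 * hits / len(actual)


# Hypothetical measured power and interval limits in MW.
actual = [52.0, 75.0, 118.0, 64.0, 30.0]
lower = [40.0, 60.0, 100.0, 70.0, 20.0]
upper = [65.0, 90.0, 130.0, 95.0, 45.0]

coverage = interval_coverage(actual, lower, upper)
print(coverage)  # 4 of the 5 samples lie inside their interval: 80.0
```

For intervals built at a 90% confidence level, a coverage close to 90% over a long validation period suggests that the stated uncertainty is consistent with the observed errors.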

To evaluate the efficiency of the proposed model, its computing time (training
and test time) and number of vectors are presented in Table 7 together with
those of SVM and GA-ANN. ORVM involves fewer vectors than SVM and has the
shortest test time, and its training time is close to that of SVM and far
below that of GA-ANN, which reflects its efficient learning capacity and
simple model structure. The running time, measured on a simple hardware set-up
with a 2.79 GHz processor and 3.12 GB of RAM, is entirely acceptable for
day-ahead dispatch decisions and even for ultra-short-term operation. Note
that the training and test times of WF1 are much longer than those of WF2
because the test in WF1 involves the grouping engine, meaning that more sets
of NWP data and more forecasting models are included in the calculation.


5.  Conclusion 

1. ORVM models of each month for wind power grouping prediction have been
proposed. Results of the case study involving two wind farms in China show
that the full-year average RMSE and MAE of the proposed model are 9.1% and
5.9% in WF1, which are lower than those of SVM and GA-ANN. Furthermore, the
proposed model effectively provides a quantitative assessment of forecasting
uncertainty. The ORVM model outperforms SVM and GA-ANN in terms of wind power
forecast accuracy and practicality.

2. A grouping engine has been established to divide wind turbines into several
groups in order to improve forecasting accuracy and, through the smoothing
effect, to minimize the NWP computational cost. The grouping engine
identifies similarities in wind speed, WT power output and site location
among the wind turbines. An NWP reference site is then selected for each
group to represent the general condition of its wind resource. This helps
enhance the representativeness of NWP data in a given area and thereby
improves forecasting performance.

3. A method for selecting training samples is presented that considers the
instability and seasonal characteristics of NWP accuracy as well as the small
training sample requirement of RVM. This method makes the forecasting models
better suited to the NWP characteristics of each month and thus significantly
improves forecast accuracy. In addition, PSO has been applied to search for
appropriate model parameters for the different samples.


[Figure panels; axis label: Power / kW]

Fig. 7. Probabilistic forecast results of WF1 on 24th May (a), 29th July (b), 25th Sep. (c) and 26th Dec. (d).

Fig. 8. Probabilistic forecast results of WF2 on 15th Feb. (a), 20th April (b), 15th July (c) and 17th Dec. (d).

4. The merits of the RVM-based model for wind power prediction are as follows:

• Diverse selection of kernel function improves model adaptability: the
kernel function does not have to satisfy Mercer's condition. Thus, the
RVM-based model can simulate the power output of different wind farms more
precisely and over a wider scope;

• Probabilistic prediction: the RVM-based model provides the fluctuation
range of the prediction at a given confidence level rather than only a single
value of wind power prediction;

• Fewer samples required in the training process: on the one hand, this makes
it easier to build a prediction model for each month, which reflects the
distribution of NWP accuracy more properly and thus enhances prediction
capacity. On the other hand, a prediction model can be built for newly built
wind farms, which have less historical data;

• Sparsity: most weights automatically tend to zero during the training
process, and only the training samples with non-zero weights (the relevance
vectors) are retained. Consequently, the model uses far fewer vectors than SVM
in the computation. Moreover, the number of relevance vectors does not
increase linearly with the size of the training set. Because only the vectors
that matter for accurate prediction are kept, model complexity is greatly
reduced and training efficiency is improved;

• Simplified parameter setting: unlike SVM, only the width of the kernel
function has to be set, which minimizes subjective influence on the RVM-based
model.
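
The sparsity and kernel-width points above can be illustrated with a small prediction routine. The sketch assumes a Gaussian kernel with a single width parameter and a model of the form y(x) = w0 + sum_i w_i K(x, x_i), in which only the training inputs with non-zero weight (the relevance vectors) need to be stored. The kernel convention, the weights, the inputs and the `width` value are made up for illustration and are not results from the paper.

```python
import math


def gaussian_kernel(x, z, width):
    """Gaussian kernel exp(-||x - z||^2 / (2 * width^2)), one common convention."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def prune(train_inputs, weights, tol=1e-6):
    """Keep only the (input, weight) pairs whose weight is not numerically zero."""
    return [(x, w) for x, w in zip(train_inputs, weights) if abs(w) > tol]


def predict(x, vectors, bias, width):
    """Point prediction as a weighted sum of kernels centred on the stored vectors."""
    return bias + sum(w * gaussian_kernel(x, v, width) for v, w in vectors)


# Hypothetical one-dimensional inputs (e.g. NWP wind speed in m/s) and weights.
train_inputs = [(3.0,), (6.0,), (9.0,), (12.0,), (15.0,)]
weights = [0.0, 0.45, 0.0, 0.0, 0.55]

relevance = prune(train_inputs, weights)
full = predict((9.0,), list(zip(train_inputs, weights)), bias=0.0, width=3.0)
sparse = predict((9.0,), relevance, bias=0.0, width=3.0)
print(len(relevance), round(sparse, 4))
```

Dropping the zero-weight vectors leaves the prediction unchanged while reducing the number of kernel evaluations, which is the practical benefit of sparsity at test time.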